ABSTRACT


This paper proposes a smartphone-based system for detecting
drowsiness in automotive drivers. The proposed system uses a
three-stage drowsiness detection technique. The first stage uses the
percentage of eyelid closure (PERCLOS), obtained from images
captured by the front camera of the smartphone and processed with a
modified eye state classification method. Near-infrared lighting
illuminates the face of the driver during night driving. If PERCLOS
crosses its threshold, the second stage computes the voiced to
unvoiced ratio (VUR) from speech data recorded by the microphone.
The VUR is also compared with a threshold and, if it exceeds that
threshold, the system moves on to the final verification stage. In
this stage, a touch response is required within a stipulated time to
decide whether the driver is drowsy; if so, an alarm is sounded.
To wake the driver, a vibrating mechanism is activated, and the live
GPS location is sent to an emergency contact. We have studied
eight other reference papers for the literature review. The system has
three advantages over existing drowsiness detection systems. First,
the three-stage verification process makes the system more reliable.
The second advantage is its implementation on an Android
smartphone, which is more readily available to most drivers or cab
owners than other general-purpose embedded platforms. The third
advantage is the use of an SMS service to inform the control room as
well as the passenger about the driver’s loss of attention.

I. INTRODUCTION

Long-distance driving under monotonous conditions
often leads to drowsiness and mental fatigue in the
driver. Sleep deprivation is another cause of
drowsiness and fatigue, which may result in road
accidents and allied mishaps. Hence, it is necessary
to monitor the drowsiness level of the driver and
alert the driver when required. Most existing
solutions estimate the drowsiness level of drivers
from a single cue. Some systems require specialized
hardware, which limits their use by the general
population. Many previous works have developed an
image-based embedded platform for detecting
drowsiness in automotive drivers solely from eye
closure rates, addressing issues such as onboard
illumination conditions and the driver’s head motion.
This method provided significant accuracy compared
with similar methods that rely exclusively on
PERCLOS. However, a decision based on a single
image-based cue may not be robust. This motivates us
to employ the findings of Dhupati et al. and
integrate speech signals along with PERCLOS to
increase the efficacy of the system. Moreover, if the
embedded platform is a smartphone, the system may
reach a broader population simply by installing the
application. The smartphone has the added advantage
of cellular data and networks for communication,
which enables the system to send warning messages to
control rooms. This communication feature will help
cab owners to maintain a record of the drowsiness
levels of their drivers and take the necessary
actions. Such an implementation may serve every
automotive vehicle as a preventive tool against
accidents caused by drowsiness or fatigue.


II. LITERATURE REVIEW

A prototype PERCLOS camera for measuring eye closure
in a heavy truck environment is discussed in the
paper “A Drowsy Driver Detection System For Heavy
Vehicle” [1]. This eye tracking system was designed
using CCD (charge-coupled device) imaging technology
along with a dedicated PC/104 computer platform with
a PCI bus. The PERCLOS camera exploits a basic
property of the human eye: the retina reflects
different amounts of infrared light at different
frequencies. The computation is carried out by the
system’s algorithm. Before any program routine is
called, the system must be initialized. It requires
two equally illuminated images of the driver’s face,
captured by two CCD cameras placed perpendicular to
each other. A bright-spot segregation routine is run
to distinguish the eyes from noise. Once the eyes
are isolated, the retinal image sizes are measured,
and the results are used to calculate PERCLOS. The
calculated height of each observation is stored in a
data file on a hard disk. One advantage of this
approach is that it can run in real time without
operator intervention. In addition, the PERCLOS
camera was able to eliminate many spurious
reflections through the subtraction process. The
main drawback is that the increased amount of
infrared light reaching the eye causes damage after
prolonged exposure. There were also problems with
timing control and synchronization of the two CCD
cameras.


Localization of human faces in digital images is a
fundamental step in face recognition. The paper
“Robust Face Detection Using the Hausdorff Distance”
[2] presents a shape comparison approach to achieve
fast, accurate face detection that is robust to
changes in illumination and background. The proposed
method is edge-based and works on grayscale still
images. The Hausdorff distance is used as a
similarity measure between a general face model and
possible instances of the object within the image.
The paper describes an efficient implementation,
making the approach suitable for real-time
applications. A two-step process that allows both
coarse detection and exact localization of faces is
presented. Experiments were performed on a large
test set and rated with a new validation
measurement. The localization results show that the
system is robust against different background
conditions and changing illumination, and the
runtime behavior of the method allows its use in
real-time video applications. The main drawbacks are
the restriction to frontal views and single faces,
and limitations in automatic model creation and
transformation parameter optimization.
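
As a brief illustration of the similarity measure named above, the following sketch computes the classical Hausdorff distance between two small point sets. It assumes edge points are given as (x, y) tuples; the function names and sample points are hypothetical, and the summary above does not say whether [2] uses this classical form or a variant.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Classical (symmetric) Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Hypothetical edge points of a face model and of a candidate image region.
model = [(0, 0), (2, 0), (1, 1)]
candidate = [(0, 0), (2, 0), (1, 3)]
print(hausdorff(model, candidate))  # smaller values mean a closer match
```

A detector built on this idea would evaluate the distance for many candidate positions of the model and keep the position with the smallest value.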


The paper “The Study of Driver Fatigue Monitor
Algorithm Combined PERCLOS and AECS” [3] combines
PERCLOS and AECS into an algorithm for detecting
driver fatigue. The algorithm is based on skin color
segmentation of color images: it directly transforms
the RGB image into a gray-level image through skin
color segmentation, and the eyes are then detected.
The eye state is identified from the eye area.
PERCLOS and AECS are then combined to detect the
driver’s fatigue. The paper emphasizes that color
segmentation and removal of regions connected to the
border are used to locate the eye. This method is
not affected by complex backgrounds, and it is
suitable for faces of different skin color and
appearance. Eye detection and tracking serve
subsequent eyelid movement monitoring, gaze
determination, face orientation estimation and
facial expression analysis, so a robust, accurate,
real-time eye tracker is crucial. The method records
how long the eyes were open and closed, together
with the start and end times, and then computes
PERCLOS. If the value exceeds 40% and the eyes
remain closed for more than three seconds, the
algorithm concludes that the driver is dozing, and
the situation is flagged as fatigued driving.
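
The decision rule described above can be sketched as follows. This is a minimal illustration, assuming per-frame open/closed labels at a fixed frame period and interpreting "closed for more than three seconds" as the longest continuous closure; the function and parameter names are hypothetical and not taken from [3].

```python
def is_fatigued(eye_closed, frame_period_s,
                perclos_limit=40.0, closure_limit_s=3.0):
    """Apply the two-condition rule: PERCLOS > 40 % and closure > 3 s.

    eye_closed: sequence of booleans, one per frame (True = eyes closed).
    frame_period_s: time between consecutive frames, in seconds.
    """
    perclos = 100.0 * sum(eye_closed) / len(eye_closed)
    longest = run = 0
    for closed in eye_closed:
        run = run + 1 if closed else 0   # length of the current closed run
        longest = max(longest, run)
    return perclos > perclos_limit and longest * frame_period_s > closure_limit_s

# 8 closed frames followed by 8 open frames, sampled every 0.5 s.
print(is_fatigued([True] * 8 + [False] * 8, 0.5))
```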


The paper “Eye State Classification Based on
Multi-feature Fusion” [4] was proposed by Wenhui
Dong and Peishu Qu of the Department of Physics,
Dezhou University, Dezhou, in 2009. The state of the
eye, open or closed, carries a lot of information
about facial expression and can be used in many
fields, such as driver fatigue detection and facial
expression analysis. After studying the
characteristics of infrared eye images, the paper
fuses four features using fuzzy fusion to judge the
eye state. This method overcomes the drawbacks of
single-feature classification methods and exploits
the complementarity of information. It also achieves
a higher classification rate. The four extracted
features are iris area, eye height, eye area and
eyelid curvature.
i) Eye Area Extraction: The gray-level change is
obvious in the infrared image of the eye, so the
gray-scale distribution can be used to choose a
binarization threshold; 150 is selected.
ii) Eyelid Curvature Extraction: The upper eyelid
can also be extracted from the binarized image.
iii) Eye Height Extraction: This is done by scanning
from the middle of the upper eyelid down to the last
non-zero pixel and calculating the distance between
the two pixels.
iv) Iris Area Extraction: When the eye is completely
open, the iris is approximately circular.
Multiplying its vertical and horizontal extents
gives the iris area.
BP Network Establishment: After the four parameters
are extracted, the eye area and the eyelid curvature
are combined into one vector, and the eye height and
the iris area into another.


The paper “A Novel Drowsiness Detection Scheme Based
On Speech Analysis With Validation Using
Simultaneous EEG Recordings” [5] proposes analyzing
the voice responses of human subjects to assess
their level of fatigue or drowsiness. The results
are validated against simultaneous
electroencephalography (EEG) measurements. In a
36-hour experiment, the subjects were asked to
repeat a particular sentence at different stages.
The response is analyzed to compute parameters such
as voiced duration, unvoiced duration, and response
time. The authors use Mel frequency cepstral
coefficients (MFCC) as features for the silence,
voiced and unvoiced parts of speech and segregate
these parts with a Gaussian mixture model (GMM)
classifier. Short-time analysis is used because the
acoustic properties of speech change continuously
during an utterance. The ratio of voiced to unvoiced
speech was calculated and found to decrease over the
successive stages of the experiment as the subject
starts to feel drowsy. From this analysis, the
authors conclude that the vocal tract offers more
constriction to airflow from the lungs as fatigue
increases, which affects the unvoiced speech
duration. The response time was also observed to
change with fatigue, but it was found unsuitable for
fatigue detection because of its large deviation.


In the paper “A PERCLOS-based Driver Fatigue
Recognition Application for Smart Vehicle Space”
[6], PERCLOS is used to evaluate driving fatigue by
measuring the proportion of time the eyes are closed
in a given interval and the duration of continued
closure. PERCLOS is the percentage of eyelid closure
over the pupil over time and reflects slow eyelid
closures rather than blinks. A PERCLOS drowsiness
metric was established in a 1994 driving simulator
study as the proportion of time in a minute that the
eyes are at least 80 percent closed. It is computed
as PERCLOS = frames of eyes closed / (frames of eyes
open + frames of eyes closed) x 100%. The smart
vehicle space software platform uses a
smart-space-oriented context-aware system framework,
S2CAS, with three layers. The first layer, called
the original context information perception layer,
is responsible for collecting all kinds of dynamic
context in the smart vehicle space, such as latitude
and longitude data, car bus data, facial feature
data, voice data, various sensor data, electronic
maps, the user information database, configuration
files, and virtual context. The second layer, smart
vehicle space SCUDWare, is responsible for shielding
the underlying differences between devices and
sensor structures, analyzing and reasoning over the
data from the original context information
perception layer, managing the equipment and
objects, integrating the functional modules, and
providing the run-time environment for applications.
The third layer is the applications and services
layer, which contains the human-computer interaction
interface and provides users with a variety of
car-based applications and services, such as safety
tips, navigation tips, information services,
entertainment services, and in-car environment
control.


The paper “Analysis of Training Parameters for
Classifiers Based on Haar-like Features to Detect
Human Faces” [7] analyzes the performance of a
Haar-like feature based classifier for face
detection with fewer features. A lower-dimensional
feature representation of the image may reduce the
computational burden, at the cost of accuracy when
detecting faces with varying orientations. In this
work, the classifier is trained with positive
instances of different orientations under such a
feature constraint. Training parameters such as
maximum deviation and maximum angle are varied to
form different classifiers. The experimental results
show that optimum values of the design parameters
produce good classifier performance on tilted human
faces. Haar-like features are generally used to
detect and recognize objects. A Haar-like feature
considers rectangular regions at a specific location
within a detection window, sums the pixel
intensities in each region, and calculates the
difference between these sums. The advantages are
that the classifier detects frontal faces with high
accuracy and that optimum values of the design
parameters yield good performance on both frontal
and tilted human faces. The main drawback is that
the best performance is achieved only at moderate
values of the maximum deviation and maximum angle,
so these parameters cannot be chosen arbitrarily
high or low.
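
To make the description of a Haar-like feature concrete, the sketch below evaluates a two-rectangle feature (top region minus bottom region) on a tiny grayscale image. The integral image used to speed up the rectangle sums is the standard technique for such features and is an addition for illustration; it is not described in the summary of [7], and all function names are hypothetical.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixel intensities in the w-by-h rectangle with top-left (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Difference between a w-by-h region and the equal region directly below it."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x, y + h, w, h)

# Bright band above a dark band, e.g. forehead above eyebrows.
img = [[10] * 4, [10] * 4, [1] * 4, [1] * 4]
ii = integral_image(img)
print(two_rect_feature(ii, 0, 0, 4, 2))
```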


Another method is proposed in the paper “Driver
Alertness Monitoring Using Fusion of Facial Features
and Bio-Signals” [8]. It measures the driver’s
tiredness using two distinct methods: eye movement
monitoring and bio-signal processing. A monitoring
system is implemented on an Android-based
smartphone, which receives sensory data via a
wireless sensor network and processes the data to
indicate the current driving aptitude of the driver.
Several sensors must be integrated and synchronized
for a more realistic evaluation of driver behavior.
The sensors include a video sensor to capture the
driver’s image and a bio-signal sensor to gather the
driver’s photoplethysmograph (PPG) signal. PPG
sensors use a light-based technology to sense the
rate of blood flow as controlled by the heart’s
pumping action. PPG is a simple, inexpensive,
non-invasive optical measurement method, often used
for heart rate monitoring, that uses a light source
and a photodetector at the skin surface to measure
volumetric variations of blood circulation.
Relaxation, extreme fatigue and drowsiness episodes
can be measured non-invasively from the pulse rate
variability (PRV) signal obtained from the PPG
signal.


A dynamic Bayesian network (DBN) paradigm is applied
for fatigue analysis. A DBN is a probabilistic
graphical model which uses different mathematical
methods to model an object based on the given input
data. The main reason for adopting a DBN is its
ability to integrate distinct categories of
parameters even when their extraction methods and
measurement techniques differ. The proposed system
detects biological variation with very high
accuracy. Video and biological sensors are
integrated for a more realistic and accurate
evaluation of driver behavior. PPG works best in
optimum lighting conditions; poor lighting may
affect the accuracy of the pulse oximeter, since it
relies on the amount of light reflected from the
skin. PPG measurements also vary if the subject has
a heart-related or similar illness: in that case the
PRV, and hence the pulse, will vary irrespective of
the subject’s drowsiness level.


The paper “Monitoring Driver’s Drowsiness Status at
Night Based on Computer Vision” [9] deals with how
driver drowsiness and fatigue decrease a driver’s
vehicle management skills. Driving at night has
become a significant problem today. Driver
drowsiness is one of the important causes of the
increasing number of road accidents and deaths.
Hence, driver drowsiness detection is considered a
highly active research field. Many methods have been
developed recently to detect driver drowsiness.
Existing methods can be classified into three
categories, based on physiological measures, vehicle
performance measures, and ocular measures. A few
methods are intrusive and distract the driver from
comfortable driving. Some need expensive sensors for
data handling. Therefore, a low-cost, real-time
system to detect driver drowsiness is developed in
that paper. In the proposed system, real-time video
of the driver is recorded using a digital camera.
Using image processing techniques, the driver’s face
is detected in each video frame. Facial landmark
points on the driver’s face are localized using a
shape predictor, and the eye aspect ratio, mouth
opening ratio, and yawning frequency are
subsequently calculated. Drowsiness is detected
based on the values of these parameters. An adaptive
thresholding method is used to set the thresholds.
Machine learning algorithms were also implemented
offline. The proposed system was tested on a face
dataset and also in real time. The experimental
results show that the system is accurate and robust.
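
The summary above does not give the formula for the eye aspect ratio. The sketch below assumes the commonly used six-landmark definition, in which the two vertical eyelid distances are divided by twice the horizontal eye width; the landmark ordering, function name and sample coordinates are assumptions made for illustration and may differ from what [9] uses.

```python
import math

def eye_aspect_ratio(p):
    """Eye aspect ratio from six (x, y) landmarks p[0]..p[5].

    Assumed ordering: p[0] and p[3] are the eye corners, p[1] and p[2] lie on
    the upper eyelid, p[5] and p[4] lie on the lower eyelid below them.
    """
    vertical = math.dist(p[1], p[5]) + math.dist(p[2], p[4])
    horizontal = math.dist(p[0], p[3])
    return vertical / (2.0 * horizontal)

# Hypothetical landmark sets for an open and a nearly closed eye.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]
print(eye_aspect_ratio(open_eye), eye_aspect_ratio(closed_eye))
```

The ratio drops towards zero as the eyelids close, which is why a threshold on it can separate open from closed eyes.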


III. PROPOSED METHODOLOGY

This proposal uses three-stage drowsiness detection.
The first stage uses the percentage of eyelid
closure (PERCLOS), obtained from images captured by
the front camera and processed with a modified eye
state classification method. Near-infrared lighting
illuminates the face of the driver during night
driving. If PERCLOS crosses its threshold, the
second stage uses the voiced to unvoiced ratio
obtained from speech data recorded by the
microphone. The final verification stage requires a
touch response within a stipulated time before the
driver is declared drowsy and an alarm is sounded.
The device maintains a log file of the periodic
values of the metrics along with the corresponding
GPS coordinates. PERCLOS is a tiredness metric based
on eye closing and opening rates. It has been shown
to be a significant marker of drowsiness. PERCLOS
can be defined as the approximate proportion of time
in which the eyelids are at least 80% closed over
the pupil. A value of PERCLOS above the threshold
(80%) indicates a higher drowsiness level and vice
versa. Voiced/unvoiced classification is carried out
with a support vector machine (SVM) using Mel
frequency cepstral coefficients (MFCC) as features.
MFCCs represent the short-term power spectrum of the
speech signal, which comprises voiced and unvoiced
parts.


Support Vector Machines (SVMs) are supervised
learning methods for regression and classification
tasks that originate from statistical learning
theory. As a classification method, an SVM is a
global classification model that generates
non-overlapping partitions and usually employs all
the attributes.


Fig 1 Flowchart of Proposed System 


The PERCLOS value of a subject is calculated as

$$P = \frac{E_c}{E_o + E_c} \times 100 \qquad (1)$$


Here, $E_c$ and $E_o$ are the counts of closed and
open eyes in a predefined interval. A higher value
of P indicates a higher drowsiness level and vice
versa. Calculating PERCLOS from an image batch
involves face detection followed by eye localization
and eye state classification. Real-life driving has
its own challenges: both the illumination and the
head pose of the driver must be considered. We
address this by preprocessing the images with
geometric and photometric correction methods.
Photometric Correction: Let the input image I be of
size N x M. The image is subdivided into blocks of
size $N_h$ x $M_w$. The numbers of blocks along the
two image dimensions, h and w, are obtained as


$$h = \frac{N}{N_h}, \qquad w = \frac{M}{M_w} \qquad (2)$$


giving a total of h x w blocks. This block-wise
process elevates the local contrast and increases
the detail in the image. However, blocking artifacts
appear at the boundaries of the corrected sub-images
when they are concatenated to form the full image.
Smoothing [10] with a 5x5 Gaussian filter solves
this problem.
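
The block subdivision of Equation (2) can be sketched as follows. The text does not state which operation is applied to each block, so the sketch assumes a simple per-block min-max contrast stretch purely for illustration; the function names are hypothetical, and the Gaussian smoothing step is not shown.

```python
def block_grid(n_rows, n_cols, block_rows, block_cols):
    """Block counts per axis, h = N / N_h and w = M / M_w as in Equation (2)."""
    if n_rows % block_rows or n_cols % block_cols:
        raise ValueError("image size must be a multiple of the block size")
    return n_rows // block_rows, n_cols // block_cols

def stretch_blocks(img, block_rows, block_cols):
    """Stretch each block to the 0..255 range (assumed per-block operation)."""
    h, w = block_grid(len(img), len(img[0]), block_rows, block_cols)
    out = [row[:] for row in img]
    for bi in range(h):
        for bj in range(w):
            ys = range(bi * block_rows, (bi + 1) * block_rows)
            xs = range(bj * block_cols, (bj + 1) * block_cols)
            vals = [img[y][x] for y in ys for x in xs]
            lo, hi = min(vals), max(vals)
            if hi == lo:
                continue  # flat block: nothing to stretch
            for y in ys:
                for x in xs:
                    out[y][x] = round(255 * (img[y][x] - lo) / (hi - lo))
    return out

img = [[100, 110, 50, 50],
       [120, 130, 50, 50]]
print(block_grid(2, 4, 2, 2))
print(stretch_blocks(img, 2, 2))
```

Because each block is corrected independently, neighbouring blocks can end up with different intensity mappings, which is the source of the boundary artifacts that the smoothing step is meant to remove.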








Geometric Correction: This correction is applied
when the driver’s face is tilted by more than 30
degrees in either direction from the upright
position. An affine rotation of the pixels of the
pre-enhanced image produces a new image in which the
face is geometrically set upright.
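
A minimal sketch of the rotation used for this correction is given below. It only builds the 2x3 affine matrix and applies it to a single point; resampling the image pixels is omitted. The tilt is assumed to be measured counter-clockwise in standard mathematical coordinates, and the function names are hypothetical.

```python
import math

TILT_LIMIT_DEG = 30.0  # correction is applied only beyond this tilt

def upright_rotation(tilt_deg, centre):
    """Affine matrix rotating by -tilt_deg about `centre`, or None if not needed."""
    if abs(tilt_deg) <= TILT_LIMIT_DEG:
        return None
    t = math.radians(-tilt_deg)
    c, s = math.cos(t), math.sin(t)
    cx, cy = centre
    # p' = R (p - centre) + centre, written as a single 2x3 matrix.
    return [[c, -s, cx - c * cx + s * cy],
            [s, c, cy - s * cx - c * cy]]

def apply_affine(m, point):
    """Map one (x, y) point through the 2x3 affine matrix m."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# A point lying 40 degrees off the reference axis is rotated back onto it.
m = upright_rotation(40.0, (0.0, 0.0))
a = math.radians(40.0)
print(apply_affine(m, (math.cos(a), math.sin(a))))
```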


Face and Eye Detection: Accurate estimation of
PERCLOS requires fast and correct localization of
the eyes. To localize the eyes, we first locate the
face region [11] in the pre-enhanced image. This
step not only reduces the search space for eye
localization but also reduces false alarms in the
eye detection stage. We use a classifier based on
Haar-like features for the face, trained with
optimal parameters. Within the detected face region,
we search for the eyes in the upper half. We employ
a Haar classifier trained with eye images. Two
classifiers are trained: one for visible images
during daytime driving and the other for NIR images
during nighttime driving.


Eye State Classification: For accurate estimation
of PERCLOS, the localized eye region needs to be
accurately classified as open or closed. A new set
of features based on the fusion of edge information
and edge orientations is proposed here.
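
As an illustration of how edge strength and orientation can be obtained from an eye image, the sketch below cuts 8 x 8 sub-images with 50% overlap and applies the Sobel operators to each. Taking the orientation as the arctangent of the two Sobel responses is a common choice and is assumed here; the function names and the synthetic test image are hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sub_images(eye, size=8, step=4):
    """Yield size x size patches with 50 % overlap (step = size / 2)."""
    for y in range(0, len(eye) - size + 1, step):
        for x in range(0, len(eye[0]) - size + 1, step):
            yield [row[x:x + size] for row in eye[y:y + size]]

def gradients(patch):
    """(magnitude, orientation in degrees) for every interior pixel of a patch."""
    out = []
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = sum(SOBEL_X[j][i] * patch[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * patch[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out.append((math.hypot(gx, gy), math.degrees(math.atan2(gy, gx))))
    return out

# Synthetic 16 x 16 image: dark left half, bright right half (vertical edge).
eye = [[0] * 8 + [100] * 8 for _ in range(16)]
patches = list(sub_images(eye))
print(len(patches), max(mag for mag, _ in gradients(patches[1])))
```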


A. Gradient Image Computation: We first obtain
sub-images $E_{i,j}$ of size 8 x 8 from the eye
image E with a 50% overlap in both directions. Each
sub-image $E_{i,j}$ is passed through the Sobel
operators to obtain gradient images.


B. Orientation Computation: The edge maps are
subjected to a gradient operation to find the
oriented gradients.


C. Feature Computation: The selected features
perform better than competing features such as edge
orientation histograms and scale-invariant feature
transform descriptors, because they use overlapping
local contrast normalization to obtain improved
accuracy [12].


D. Classification: We train a linear SVM with 1200
eye images, of which 700 are open and 500 are
closed, taken from a database created using normal
and NIR illumination.


E. PERCLOS Computation: Once the eyes are
classified as open or closed, the algorithm computes
the PERCLOS value over a sliding time window of 10
seconds.
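
Equation (1) evaluated over a 10-second sliding window can be sketched as follows. This is a minimal illustration that assumes one open/closed decision per processed frame with a timestamp in seconds; the class name and the sample frame rate are hypothetical.

```python
from collections import deque

class PerclosWindow:
    """Keeps recent eye-state decisions and reports PERCLOS in percent."""

    def __init__(self, window_s=10.0):
        self.window_s = window_s
        self.samples = deque()  # (timestamp_s, is_closed) pairs, oldest first

    def add(self, timestamp_s, is_closed):
        self.samples.append((timestamp_s, bool(is_closed)))
        # Drop decisions that are older than the window length.
        while timestamp_s - self.samples[0][0] > self.window_s:
            self.samples.popleft()

    def perclos(self):
        if not self.samples:
            return 0.0
        closed = sum(1 for _, c in self.samples if c)   # E_c
        return 100.0 * closed / len(self.samples)       # E_c / (E_o + E_c) * 100

window = PerclosWindow()
for t in range(10):                 # one decision per second, for illustration
    window.add(float(t), is_closed=(t < 3))
print(window.perclos())
```

In the full system, the value returned by `perclos()` would be compared with the stage-one threshold after each new frame.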


Fusing additional cues such as speech signals can
make the drowsiness detection system more reliable.
The voiced to unvoiced ratio (VUR) of speech signals
has been validated as an indicator of drowsiness.
The built-in microphone of the smartphone captures
speech signals sampled at 20 kHz, since speech
information extends up to about 7.5 kHz. The speech
data is processed frame-wise. The vocal fold
vibrations may be assumed periodic if the signal is
of short duration (10-30 ms). For this reason, the
speech data is processed in small frames of 10-30
ms. Singular Value Decomposition (SVD) is performed
frame-wise to remove noise and redundant
information. Voiced speech is produced by the
vibrations of the vocal cords, whereas unvoiced
sounds are due to the turbulence of air in the vocal
tract (mouth, tongue, velum, etc.). Unvoiced speech
has lower energy and a higher zero crossing rate
than voiced speech. Voiced/unvoiced classification
is carried out using a support vector machine (SVM)
with Mel frequency cepstral coefficients (MFCC) as
features. MFCCs represent the short-term power
spectrum of the speech signal. Once the MFCCs are
obtained, the SVM returns the voiced speech $v_s(n)$
and unvoiced speech $u_s(n)$ of lengths $N_v$ and
$N_u$ respectively. The VUR is finally obtained as
the ratio of their energies.
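
The frame-wise processing and the final energy ratio can be sketched as follows. The voiced/unvoiced labels are taken as given inputs here; in the example they come from a simple zero-crossing-rate threshold, which is a stand-in for the MFCC-based SVM described above and not the proposed classifier. The text does not say whether the energies are normalized by $N_v$ and $N_u$, so total energies are assumed. All function names and the 0.5 threshold are hypothetical.

```python
def split_frames(signal, sample_rate_hz=20_000, frame_ms=20):
    """Cut the signal into non-overlapping frames of frame_ms milliseconds."""
    n = sample_rate_hz * frame_ms // 1000
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]

def energy(frame):
    """Sum of squared samples in the frame."""
    return sum(s * s for s in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def voiced_unvoiced_ratio(frames, voiced_flags):
    """Total energy of voiced frames divided by that of unvoiced frames."""
    e_voiced = sum(energy(f) for f, v in zip(frames, voiced_flags) if v)
    e_unvoiced = sum(energy(f) for f, v in zip(frames, voiced_flags) if not v)
    if e_unvoiced == 0:
        raise ValueError("no unvoiced energy in the recording")
    return e_voiced / e_unvoiced

# Synthetic signal: a strong, slowly varying frame, then a weak, rapidly alternating one.
signal = [2] * 400 + [1, -1] * 200
frames = split_frames(signal)
flags = [zero_crossing_rate(f) < 0.5 for f in frames]  # stand-in for the SVM labels
print(voiced_unvoiced_ratio(frames, flags))
```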


The third and final verification stage of the
framework is the touch-based reaction test. In this
stage, a voice instruction asks the driver to touch
the smartphone screen within a stipulated time of 10
seconds. This stage is invoked when both the voice
and vision based classification methods predict the
driver to be drowsy. If the driver fails to respond
within 10 s, the final decision is drowsy, and an
alarm is sounded through the speakers. In addition,
an SMS with the GPS location of the driver is sent
to an emergency number obtained at the start of the
ride. A vibration device worn around the hand like a
band vibrates to wake the driver. The event is also
marked in the log file stored in the internal memory
of the smartphone.
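
The way the three stages gate each other can be sketched as a short cascade. The threshold values are left as parameters because the text does not fix them for the VUR, and the direction of the VUR comparison follows the abstract (a value above the threshold passes to the next stage). The function name, the return labels and the numbers in the example are hypothetical.

```python
def assess_driver(perclos, vur, touch_delay_s,
                  perclos_limit, vur_limit, touch_timeout_s=10.0):
    """Return "drowsy" only if all three stages agree, otherwise "alert".

    touch_delay_s: seconds until the driver touched the screen, or None
    if no touch was registered.
    """
    if perclos <= perclos_limit:
        return "alert"    # stage 1: PERCLOS did not cross its threshold
    if vur <= vur_limit:
        return "alert"    # stage 2: VUR did not cross its threshold
    if touch_delay_s is not None and touch_delay_s <= touch_timeout_s:
        return "alert"    # stage 3: driver responded in time
    return "drowsy"       # alarm, vibration, SMS with GPS, and log entry follow

# Hypothetical measurements and thresholds, for illustration only.
print(assess_driver(perclos=85.0, vur=1.4, touch_delay_s=None,
                    perclos_limit=80.0, vur_limit=1.0))
```

Each later stage is evaluated only when the earlier one has flagged drowsiness, which keeps the microphone and the touch prompt idle during normal driving.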


Fig 2 Detailed Procedure of Proposed System 


IV. CONCLUSION

We have proposed a smartphone-based drowsiness
detection solution for automotive drivers, which
uses a three-stage verification process to detect
drowsiness. The measures are PERCLOS, VUR, and a
reaction test on the smartphone screen, combined
with an alert system that produces a beep and
vibrates a device worn on the driver’s arm. Each
stage is activated based on the decision of the
preceding stage. The system maintains a log which
marks the events when the driver was found to be
drowsy based on PERCLOS, VUR and touch response. The
application has an option to upload the log file to
a cloud server to maintain a record. This option
will be useful to cab service providers, who can
keep records of driver performance. We have tested
the sub-operations, such as eye state
classification, PERCLOS, and VUR-based drowsiness
state classification, individually as well as with
the combined measures, and cross-correlated the
estimated cues against standard cues. The device may
be suitably modified to monitor the loss of
attention of any person engaged in a safety-critical
operation. With improvements in the acquisition
frame rate of the front camera, fast ocular motions
such as eye movements may be captured, which can
provide earlier indications of the onset of fatigue.
An extension of this work may be to track road
conditions using the primary camera in parallel with
the driver’s drowsiness level. However, such an
implementation would require extensive
multi-threading.